Exclusion criteria

Sample counts for exclusion criteria
Exclusion criterion                                  Yes      No
Cancer status (within one year prior to baseline)  10334  492064
CHD                                                22937  479461
Cirrhosis                                           1780  500618
Diabetes                                           26395  476003
Pregnant                                             150  502248

Sample sizes after exclusions
N
446808
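
As a rough illustration of how the exclusion criteria above could be applied, the following sketch filters a toy participant list and tabulates Yes/No counts per criterion. The record layout and flag names (`cancer_within_1y`, `chd`, and so on) are hypothetical and are not UKB field names. Note that the "Yes" counts in the table sum to 61,596, whereas 502,398 - 446,808 = 55,590 participants were dropped, which is consistent with some participants meeting more than one criterion; the sketch handles that case by excluding on any flag.

```python
# Hypothetical flag names; True means the exclusion criterion applies.
EXCLUSION_FLAGS = ("cancer_within_1y", "chd", "cirrhosis", "diabetes", "pregnant")


def make_participant(pid, **flags):
    """Build a toy record with every exclusion flag defaulting to False."""
    record = {"id": pid}
    record.update({flag: False for flag in EXCLUSION_FLAGS})
    record.update(flags)
    return record


def count_flags(participants):
    """Tabulate Yes/No counts per criterion, mirroring the table layout."""
    counts = {}
    for flag in EXCLUSION_FLAGS:
        yes = sum(1 for p in participants if p[flag])
        counts[flag] = {"Yes": yes, "No": len(participants) - yes}
    return counts


def apply_exclusions(participants):
    """Keep only participants for whom no exclusion flag is set."""
    return [p for p in participants
            if not any(p[flag] for flag in EXCLUSION_FLAGS)]


toy = [
    make_participant(1),
    make_participant(2, chd=True, diabetes=True),  # meets two criteria
    make_participant(3, pregnant=True),
    make_participant(4),
]
flag_counts = count_flags(toy)
kept = apply_exclusions(toy)
print(flag_counts)
print("N after exclusions:", len(kept))
```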

Biomarker preprocessing

Cardiometabolic blood biomarkers measured in the UK Biobank
Biomarker                                 Abbreviation  Group           UKB Field  Log-transformed
Alanine aminotransferase                  ALT           Liver           30620      Yes
Apolipoprotein B                          ApoB          Lipoproteins    30640      No
High-sensitivity C-reactive protein       hsCRP         Inflammation    30710      Yes
Total cholesterol                         TC            Lipids          30690      No
Random glucose                            RG            Glycemic        30740      No
Glycated hemoglobin                       HbA1c         Glycemic        30750      No
High-density lipoprotein cholesterol      HDL-C         Lipids          30760      No
Low-density lipoprotein cholesterol       LDL-C         Lipids          30780      No
Triglycerides                             TG            Lipids          30870      Yes
Vitamin D                                 VitD          Other           30890      No
Systolic blood pressure                   SBP           Blood pressure  4080       No
Diastolic blood pressure                  DBP           Blood pressure  4079       No
Aggregate Index of Systemic Inflammation  AISI          Inflammation    Aggregate  Yes
Systemic Immune-Inflammation Index        SII           Inflammation    Aggregate  Yes
Systemic Inflammation Response Index      SIRI          Inflammation    Aggregate  Yes

The following QC and preprocessing steps were performed on raw blood biomarker data from the main assessment center visit:

  1. Adjust for statin usage where appropriate (TC, LDL-C, and ApoB)
  2. Log-transform substantially non-normal biomarkers (ALT, TG, hsCRP, and the CBC-based inflammatory indices AISI, SII, and SIRI)
  3. Winsorize biomarker distributions at 5 SDs from the mean

Note: These preprocessing steps were undertaken in the full dataset (prior to removing related individuals).
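
The log-transform and winsorization steps listed above can be sketched as follows. This is a minimal illustration, assuming natural logarithms and strictly positive measurements; the function names are hypothetical. The statin adjustment is left out because the adjustment factors are not given here.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform (assumes strictly positive measurements)."""
    return [math.log(v) for v in values]


def winsorize_sd(values, n_sd=5.0):
    """Clamp each value to the interval mean +/- n_sd sample SDs."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    lo, hi = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(v, lo), hi) for v in values]


# Toy triglyceride-like values with one extreme measurement at the end.
raw = [0.9, 1.1] * 25 + [1000.0]
logged = log_transform(raw)
winsorized = winsorize_sd(logged, n_sd=5.0)

print("max before winsorizing:", max(logged))
print("max after winsorizing: ", max(winsorized))
```

Winsorizing clamps extreme values to the boundary rather than dropping them, so the sample size is unchanged by this step.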

Biomarker distributions

Dashed lines in raw biomarker histograms denote the limits of the analytical range for the associated test (as provided by UKB).

Test the 5 SD outlier removal threshold

Threshold (# SDs)  # samples removed
3                  5372
4                  2668
5                  1473
6                  909
7                  597

Threshold (# SDs)  # samples removed
3                  6017
4                  2479
5                  1114
6                  507
7                  220
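
A table of this kind can be produced by counting, for each candidate threshold, how many values lie more than that many SDs from the mean. The sketch below is one way to do it under the assumption that the mean and sample SD are computed once on the full distribution; the function names and toy data are illustrative only.

```python
import statistics


def count_beyond(values, n_sd):
    """Number of values lying more than n_sd sample SDs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sum(1 for v in values if abs(v - mean) > n_sd * sd)


def threshold_table(values, thresholds=(3, 4, 5, 6, 7)):
    """Map each SD threshold to the number of samples it would flag."""
    return {k: count_beyond(values, k) for k in thresholds}


# Toy distribution: a tight bulk plus two increasingly extreme values.
toy = [-1.0, 1.0] * 100 + [5.0, 8.0]
table = threshold_table(toy)

print("Threshold (# SDs)  # samples flagged")
for k, n in table.items():
    print(f"{k:<19}{n}")
```

By construction the counts cannot increase as the threshold grows, which matches the pattern in both tables above.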

Diet preprocessing

Dietary data came from 24-hour dietary questionnaires completed at between one and five timepoints per participant: at the assessment center visit (for the final ~70k participants) and online between approximately Feb 2011 and Apr 2012.

The following QC and preprocessing steps were performed:

  1. Exclude questionnaires with reported energy intake <600 or >4800 kcal/day
  2. Exclude questionnaires for which the participant reported that dietary intake that day was not typical (UKB field 100020)
  3. Take the mean for each basic dietary variable (single foods/nutrients) over all questionnaires returned by a given participant
  4. Calculate derived values (for example, the MUFA:SFA ratio)
  5. Winsorize all diet quantities (including derived values) at 3 SDs from the mean
  6. Calculate a 9-item Mediterranean diet score (MDS) based on the method described by Carter et al. 2019 (J. Nutr. Sci.). Each component contributes one point: healthy components score when intake is above the population median, unhealthy components score when intake is below it, and alcohol scores when intake falls within a specific range.
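
Steps 1 to 5 above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the code used for this analysis: the record keys (`kcal`, `typical_diet`, `mufa`, `sfa`) are hypothetical names, and the derived ratio is computed from the per-participant means, following the order of the steps.

```python
import statistics
from collections import defaultdict

KCAL_MIN, KCAL_MAX = 600, 4800  # plausibility limits from step 1


def keep_recall(recall):
    """Steps 1-2: plausible energy intake and a self-reported typical day."""
    return KCAL_MIN <= recall["kcal"] <= KCAL_MAX and recall["typical_diet"]


def participant_means(recalls, variables):
    """Step 3: mean of each basic variable over a participant's kept recalls."""
    by_id = defaultdict(list)
    for r in recalls:
        if keep_recall(r):
            by_id[r["id"]].append(r)
    return {pid: {v: statistics.fmean(r[v] for r in rs) for v in variables}
            for pid, rs in by_id.items()}


def add_mufa2sfa(means):
    """Step 4: derive the MUFA:SFA ratio from the averaged intakes."""
    for row in means.values():
        row["MUFA2SFA"] = row["mufa"] / row["sfa"]
    return means


def winsorize_columns(means, n_sd=3.0):
    """Step 5: clamp each variable to mean +/- n_sd SDs across participants."""
    if len(means) < 2:
        return means  # SD is undefined for fewer than two participants
    for v in list(next(iter(means.values()))):
        col = [row[v] for row in means.values()]
        mu, sd = statistics.fmean(col), statistics.stdev(col)
        for row in means.values():
            row[v] = min(max(row[v], mu - n_sd * sd), mu + n_sd * sd)
    return means


recalls = [
    {"id": 1, "kcal": 2000, "typical_diet": True, "mufa": 30.0, "sfa": 20.0},
    {"id": 1, "kcal": 2200, "typical_diet": True, "mufa": 20.0, "sfa": 20.0},
    {"id": 1, "kcal": 5000, "typical_diet": True, "mufa": 90.0, "sfa": 60.0},
    {"id": 2, "kcal": 1800, "typical_diet": False, "mufa": 25.0, "sfa": 25.0},
    {"id": 3, "kcal": 2500, "typical_diet": True, "mufa": 40.0, "sfa": 20.0},
]
means = winsorize_columns(add_mufa2sfa(participant_means(recalls, ["mufa", "sfa"])))
print(means)
```

In this toy input, participant 2 drops out entirely because their only questionnaire was flagged as atypical.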

Mediterranean diet score components
MDS component           Abbreviation  Category   Threshold
Vegetables              VEG           Healthy    Greater than median
Legumes                 LEGUMES       Healthy    Greater than median
Fruit                   FRUIT         Healthy    Greater than median
Nuts                    NUTS          Healthy    Greater than median
Fish                    FISH          Healthy    Greater than median
Whole grains            WHGRAIN       Healthy    Greater than median
MUFA-to-SFA ratio       MUFA2SFA      Healthy    Greater than median
Red and processed meat  REDPRMEAT     Unhealthy  Less than median
Alcohol                 ALC           Mixed      >5 and <25 g/day
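
The scoring rule in the component table can be written out directly. The sketch below is a minimal version, assuming strict inequalities at the median (so an intake exactly equal to the median scores zero) and medians taken over whatever sample is passed in; the function name is hypothetical, while the component abbreviations follow the table.

```python
import statistics

HEALTHY = ("VEG", "LEGUMES", "FRUIT", "NUTS", "FISH", "WHGRAIN", "MUFA2SFA")
UNHEALTHY = ("REDPRMEAT",)
ALC_LOW, ALC_HIGH = 5.0, 25.0  # g/day; a point is given strictly inside


def mds_scores(rows):
    """Return one 0-9 Mediterranean diet score per participant row."""
    medians = {c: statistics.median(r[c] for r in rows)
               for c in HEALTHY + UNHEALTHY}
    scores = []
    for r in rows:
        score = sum(r[c] > medians[c] for c in HEALTHY)
        score += sum(r[c] < medians[c] for c in UNHEALTHY)
        score += ALC_LOW < r["ALC"] < ALC_HIGH
        scores.append(int(score))
    return scores


def toy_row(healthy_intake, red_meat, alcohol):
    """Build a toy participant with the same intake for all healthy items."""
    row = {c: healthy_intake for c in HEALTHY}
    row["REDPRMEAT"] = red_meat
    row["ALC"] = alcohol
    return row


rows = [
    toy_row(3.0, red_meat=1.0, alcohol=10.0),  # favourable on every item
    toy_row(2.0, red_meat=2.0, alcohol=0.0),   # exactly at every median
    toy_row(1.0, red_meat=3.0, alcohol=30.0),  # unfavourable on every item
]
scores = mds_scores(rows)
print(scores)
```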

Sample sizes available with dietary data
# recalls completed  N
1                    65971
2                    38323
3                    26206
4                    12351
5                    1725

Diet-biomarker relationships

We can use basic linear regressions of biomarker values on dietary variables (MDS and its individual components) to prioritize particular diet-biomarker pairs for further exploration. We will also include a summary variable incorporating signal from all cardiometabolic biomarkers (the first principal component from a PCA [centered and scaled] on all biomarkers).
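
To make the two ingredients of this screen concrete, the sketch below computes first-principal-component scores from a centered and scaled biomarker matrix and fits a single-predictor linear regression that reports an estimate, standard error, and t statistic. It uses NumPy on simulated data; the function names are hypothetical, and any covariate adjustment used in the actual models is not shown.

```python
import numpy as np


def first_pc(biomarkers):
    """Scores on the first principal component of a centered, scaled matrix."""
    z = (biomarkers - biomarkers.mean(axis=0)) / biomarkers.std(axis=0, ddof=1)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]


def simple_ols(y, x):
    """Regress y on x with an intercept; return the slope summary."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return {"estimate": beta[1], "std.error": se, "statistic": beta[1] / se}


# Simulated biomarkers sharing a common signal (200 people, 5 biomarkers).
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
biomarkers = shared + rng.normal(size=(200, 5))
pc1 = first_pc(biomarkers)

# Deterministic toy outcome: slope 2, intercept 1, small alternating noise.
x = np.arange(10.0)
y = 2.0 * x + 1.0 + np.tile([0.1, -0.1], 5)
fit = simple_ols(y, x)
print(fit)
```

The sign of a principal component is arbitrary, so the direction of any association with `pc1` should be interpreted via its loadings.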

Influence of mixed models

Does use of a mixed model, with a random effect governed by a diet-based covariance matrix, change effect estimates or precision compared to either unadjusted models or models using dietary fixed effects?
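
One way to picture the mixed-model comparison is as a generalized least squares (GLS) fit in which the outcome covariance is a diet-based similarity matrix plus independent residual noise. The sketch below is only an illustration: how the diet-based covariance matrix was constructed and how the variance components were estimated are not stated here, so the construction `K = Z Z^T / p` from standardized diet variables and the fixed variance components passed in by hand are both assumptions.

```python
import numpy as np


def diet_covariance(diet):
    """Assumed construction: K = Z Z^T / p from standardized diet variables."""
    z = (diet - diet.mean(axis=0)) / diet.std(axis=0, ddof=1)
    return z @ z.T / z.shape[1]


def gls_fixed_effects(y, X, K, var_random, var_resid):
    """GLS estimates and SEs with V = var_random * K + var_resid * I."""
    V = var_random * K + var_resid * np.eye(len(y))
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    cov = np.linalg.inv(X.T @ Vinv_X)
    beta = cov @ (X.T @ Vinv_y)
    return beta, np.sqrt(np.diag(cov))


# Simulated data: one exposure of interest plus three other diet variables.
rng = np.random.default_rng(1)
n = 50
exposure = rng.normal(size=n)
other_diet = rng.normal(size=(n, 3))
y = 0.5 * exposure + other_diet @ np.array([0.3, -0.2, 0.1]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), exposure])
K = diet_covariance(other_diet)

beta_unadj, se_unadj = gls_fixed_effects(y, X, K, var_random=0.0, var_resid=1.0)
beta_mixed, se_mixed = gls_fixed_effects(y, X, K, var_random=0.5, var_resid=1.0)
print("unadjusted:", beta_unadj[1], "mixed:", beta_mixed[1])
```

Setting `var_random` to zero recovers the ordinary least squares point estimate, which gives a convenient baseline for the comparison posed above. In practice the variance components would be estimated (for example by REML) rather than fixed.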

## # A tibble: 1 × 5
##   term  estimate std.error statistic   p.value
##   <chr>    <dbl>     <dbl>     <dbl>     <dbl>
## 1 FISH   -0.0261   0.00633     -4.13 0.0000371
## # A tibble: 1 × 5
##   term     estimate std.error statistic  p.value
##   <chr>       <dbl>     <dbl>     <dbl>    <dbl>
## 1 FISH_bin   -0.129   0.00938     -13.8 4.97e-43
## # A tibble: 1 × 5
##   term      estimate std.error statistic  p.value
##   <chr>        <dbl>     <dbl>     <dbl>    <dbl>
## 1 oily_fish   -0.247    0.0200     -12.3 7.39e-35
## # A tibble: 1 × 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl>   <dbl>
## 1 nonoily_fish  -0.0323    0.0216     -1.49   0.135
## # A tibble: 1 × 5
##   term     estimate std.error statistic p.value
##   <chr>       <dbl>     <dbl>     <dbl>   <dbl>
## 1 fish_oil  -0.0310    0.0104     -2.99 0.00278